Histogram-based Outlier Score (HBOS): A fast Unsupervised Anomaly Detection Algorithm

Authors

  • Markus Goldstein
  • Andreas Dengel
Abstract

Unsupervised anomaly detection is the process of finding outliers in data sets without prior training. In this paper, a histogram-based outlier detection (HBOS) algorithm is presented, which scores records in linear time. It assumes independence of the features, making it much faster than multivariate approaches at the cost of less precision. A comparative evaluation on three UCI data sets and 10 standard algorithms shows that it can detect global outliers as reliably as state-of-the-art algorithms, but it performs poorly on local outlier problems. In our experiments, HBOS is up to 5 times faster than clustering-based algorithms and up to 7 times faster than nearest-neighbor-based methods.
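
The idea of scoring each record from independent per-feature histograms can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes numeric features, equal-width bins, histograms normalised so that the tallest bin has height 1, and a score formed by summing the log of the inverse bin height over all features. The names `fit_histograms`, `hbos_score`, `n_bins`, and `eps` are hypothetical.

```python
import math


def fit_histograms(rows, n_bins=10):
    """Build one equal-width histogram per feature.

    Features are treated independently, so each column gets its own model:
    (lower edge, bin width, normalised bin heights).
    """
    n_features = len(rows[0])
    models = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        # Fall back to width 1.0 when the feature is constant (assumption).
        width = (hi - lo) / n_bins or 1.0
        counts = [0] * n_bins
        for v in column:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        peak = max(counts)
        # Normalise so the tallest bin has height 1.0.
        heights = [c / peak for c in counts]
        models.append((lo, width, heights))
    return models


def hbos_score(row, models, eps=1e-9):
    """Sum log(1 / bin height) over features; higher means more anomalous.

    One bin lookup per feature, so scoring a record costs O(n_features).
    Values outside the fitted range are clamped to the edge bins, and
    eps (an assumption) guards against empty bins.
    """
    score = 0.0
    for v, (lo, width, heights) in zip(row, models):
        idx = int((v - lo) / width)
        idx = max(0, min(idx, len(heights) - 1))
        score += math.log(1.0 / max(heights[idx], eps))
    return score


# Toy data: 30 records in a tight cluster plus one obvious global outlier.
rows = [(float(i % 5), float(i % 3)) for i in range(30)] + [(100.0, 50.0)]
models = fit_histograms(rows)
scores = [hbos_score(r, models) for r in rows]
most_anomalous = max(range(len(rows)), key=lambda i: scores[i])
```

In this toy run the last record receives the highest score, because it falls in a sparsely populated bin for both features. Because each feature is scored on its own, a sketch like this cannot see outliers that are only unusual in combination, which fits the abstract's remark about the trade-off between speed and precision.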


Similar resources

Outlier Detection on Mixed-Type Data: An Energy-Based Approach

Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, such as continuous or discrete. However, real-world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. I...


A multi-step outlier-based anomaly detection approach to network-wide traffic

Outlier detection is of considerable interest in fields such as physical sciences, medical diagnosis, surveillance detection, fraud detection and network anomaly detection. The data mining and network management research communities are interested in improving existing score-based network traffic anomaly detection techniques because there is ample scope to increase performance. In this paper, we pre...


A Comparative Study on Outlier Removal from a Large-scale Dataset using Unsupervised Anomaly Detection

Outlier removal from training data is a classical problem in pattern recognition. Nowadays, this problem is becoming more important for large-scale datasets for the following two reasons: First, we will have a higher risk of “unexpected” outliers, such as mislabeled training data. Second, a large-scale dataset makes it more difficult to grasp the distribution of outliers. On the other hand, many uns...


Comparison of Unsupervised Anomaly Detection Techniques

Anomaly detection is the process of finding outlying records in a given data set. This problem has been of increasing importance due to the increase in the size of data and the need to efficiently extract those outlying records, which could indicate unauthorized access to a system, credit card theft or the diagnosis of a disease. The aim of this bachelor thesis is to implement a RapidMiner ex...


Outlier Detection on High Dimensional Data Using RNN

Background: Outlier detection is an important factor in data mining since it is used in various real-time applications. An outlier is an extreme point that is not related to any class. Dealing with dimensions is a great challenge for effective outlier detection, due to the “curse of dimensionality”. In a high dimensional data space, it is difficult to detect most related points and most un...




Publication date: 2012